Add capability security failure taxonomy labels #154
Code Review
This pull request expands the failure taxonomy by introducing four new capability and security-related labels: CAPABILITY_BOUNDARY_LOSS, UNAUTHORIZED_CAPABILITY_PATH, APPROVAL_GATE_LOSS, and POLICY_ENFORCEMENT_GAP. These changes include updates to the documentation, the core taxonomy definition in Python, and the addition of a registration test. Review feedback highlights an inconsistency in the severity_class naming convention compared to existing values and suggests a more idiomatic set-based comparison in the test suite.
"CAPABILITY_BOUNDARY_LOSS": {
"operational_meaning": "Reconstructed replay state no longer preserves an explicit capability, resource, or tool boundary present in the original operational state.",
"observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports missing boundary nodes or boundary edges after reconstruction.",
"contract_or_invariant_type": "capability_boundary",
"severity_class": "safety",
"non_goal": "Not a runtime exploitability claim, live access-control verdict, or external security-breach assertion.",
},
"UNAUTHORIZED_CAPABILITY_PATH": {
"operational_meaning": "Reconstructed replay state introduces an explicit capability, tool, or resource path absent from the original allowed capability boundary.",
"observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports added boundary edges or capability nodes that create a new explicit path.",
"contract_or_invariant_type": "capability_boundary",
"severity_class": "safety",
"non_goal": "Not an intent inference, exploitability judgment, or authorization conclusion derived from prose or unstated policy.",
},
"APPROVAL_GATE_LOSS": {
"operational_meaning": "Replay reconstruction drops an explicit approval, validation, or human-gate commitment required before a guarded action.",
"observable_trigger": "Fixture expectation, ordering artifact, capability-boundary artifact, or validator reports that a required approval or validation gate is missing before a guarded action path.",
"contract_or_invariant_type": "governance_gate",
"severity_class": "governance",
"non_goal": "Not a requirement for live human-in-the-loop runtime behavior and not a clinical, legal, or production approval claim.",
},
"POLICY_ENFORCEMENT_GAP": {
"operational_meaning": "Reconstructed replay state preserves an action or dependency while losing the explicit policy enforcement condition that constrained it.",
"observable_trigger": "Fixture expectation, policy-order contract, capability-boundary artifact, or validator reports a missing policy or guard condition while the related action path remains present.",
"contract_or_invariant_type": "policy_enforcement",
"severity_class": "governance",
"non_goal": "Not a live policy-engine bypass claim, external compliance assertion, or runtime exploitability determination.",
},
The new severity_class values "safety" and "governance" are inconsistent with the existing values (critical, high, medium) used in the taxonomy. This could cause confusion as they seem to represent categories rather than severity levels.
To improve clarity and maintainability, please consider mapping these to the existing severity scale. For example:
CAPABILITY_BOUNDARY_LOSS and UNAUTHORIZED_CAPABILITY_PATH seem to be critical issues.
APPROVAL_GATE_LOSS and POLICY_ENFORCEMENT_GAP seem to be high severity issues.
If "safety" and "governance" are intended as new classification axes, it might be better to introduce a separate field for them to avoid overloading the meaning of severity_class.
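If the separate-field route is preferred, it could look roughly like the sketch below. This is only an illustration: the field name `risk_domain`, the constant `ALLOWED_SEVERITY_CLASSES`, the helper `invalid_severity_labels`, and the trimmed `EXAMPLE_TAXONOMY` entries are all hypothetical, and the allowed scale is assumed to be exactly the three values named in this comment.

```python
# Assumed severity scale, taken from the values named in this review comment.
ALLOWED_SEVERITY_CLASSES = frozenset({"critical", "high", "medium"})

# Hypothetical, trimmed entries: `risk_domain` is an assumed new field,
# not something the PR or the existing taxonomy defines.
EXAMPLE_TAXONOMY = {
    "CAPABILITY_BOUNDARY_LOSS": {
        "severity_class": "critical",
        "risk_domain": "safety",
    },
    "APPROVAL_GATE_LOSS": {
        "severity_class": "high",
        "risk_domain": "governance",
    },
}


def invalid_severity_labels(taxonomy):
    """Return the sorted labels whose severity_class is outside the allowed scale."""
    return sorted(
        label
        for label, entry in taxonomy.items()
        if entry.get("severity_class") not in ALLOWED_SEVERITY_CLASSES
    )


print(invalid_severity_labels(EXAMPLE_TAXONOMY))
```

A check of this kind would keep severity_class on a single scale while still letting the category survive in its own field.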
"CAPABILITY_BOUNDARY_LOSS": {
"operational_meaning": "Reconstructed replay state no longer preserves an explicit capability, resource, or tool boundary present in the original operational state.",
"observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports missing boundary nodes or boundary edges after reconstruction.",
"contract_or_invariant_type": "capability_boundary",
"severity_class": "critical",
"non_goal": "Not a runtime exploitability claim, live access-control verdict, or external security-breach assertion.",
},
"UNAUTHORIZED_CAPABILITY_PATH": {
"operational_meaning": "Reconstructed replay state introduces an explicit capability, tool, or resource path absent from the original allowed capability boundary.",
"observable_trigger": "Capability-boundary replay contract, fixture expectation, or validator reports added boundary edges or capability nodes that create a new explicit path.",
"contract_or_invariant_type": "capability_boundary",
"severity_class": "critical",
"non_goal": "Not an intent inference, exploitability judgment, or authorization conclusion derived from prose or unstated policy.",
},
"APPROVAL_GATE_LOSS": {
"operational_meaning": "Replay reconstruction drops an explicit approval, validation, or human-gate commitment required before a guarded action.",
"observable_trigger": "Fixture expectation, ordering artifact, capability-boundary artifact, or validator reports that a required approval or validation gate is missing before a guarded action path.",
"contract_or_invariant_type": "governance_gate",
"severity_class": "high",
"non_goal": "Not a requirement for live human-in-the-loop runtime behavior and not a clinical, legal, or production approval claim.",
},
"POLICY_ENFORCEMENT_GAP": {
"operational_meaning": "Reconstructed replay state preserves an action or dependency while losing the explicit policy enforcement condition that constrained it.",
"observable_trigger": "Fixture expectation, policy-order contract, capability-boundary artifact, or validator reports a missing policy or guard condition while the related action path remains present.",
"contract_or_invariant_type": "policy_enforcement",
"severity_class": "high",
"non_goal": "Not a live policy-engine bypass claim, external compliance assertion, or runtime exploitability determination.",
},
"APPROVAL_GATE_LOSS",
"POLICY_ENFORCEMENT_GAP",
}
missing = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)
The logic to find missing labels can be expressed more concisely and idiomatically using set operations. With only four expected labels, any performance difference is negligible; the benefit is readability.
- missing = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)
+ missing = sorted(expected_labels - FAILURE_TAXONOMY.keys())
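To show that the two forms agree, here is a small self-contained sketch. The `FAILURE_TAXONOMY` below is a stand-in with empty entries and one label deliberately left out; it is not the real mapping from src/validation/failure_taxonomy.py.

```python
# Stand-in mapping for illustration only; one expected label is omitted on purpose.
FAILURE_TAXONOMY = {
    "CAPABILITY_BOUNDARY_LOSS": {},
    "UNAUTHORIZED_CAPABILITY_PATH": {},
    "APPROVAL_GATE_LOSS": {},
}

expected_labels = {
    "CAPABILITY_BOUNDARY_LOSS",
    "UNAUTHORIZED_CAPABILITY_PATH",
    "APPROVAL_GATE_LOSS",
    "POLICY_ENFORCEMENT_GAP",
}

# Original form: filter expected labels with a membership test on the dict.
missing_loop = sorted(label for label in expected_labels if label not in FAILURE_TAXONOMY)

# Suggested form: set difference against the dict's key view.
missing_set = sorted(expected_labels - FAILURE_TAXONOMY.keys())

print(missing_loop, missing_set)
```

Dictionary key views support set operations, so no explicit `set(...)` conversion is needed on the right-hand side.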
Motivation
Description
Added four labels to FAILURE_TAXONOMY in src/validation/failure_taxonomy.py: CAPABILITY_BOUNDARY_LOSS, UNAUTHORIZED_CAPABILITY_PATH, APPROVAL_GATE_LOSS, and POLICY_ENFORCEMENT_GAP, each including the required fields operational_meaning, observable_trigger, contract_or_invariant_type, severity_class, and non_goal.
Updated docs/failure_taxonomy.md to document deterministic, fixture-bound semantics and evidence expectations for these labels.
Added test_capability_security_expansion_labels_are_registered in tests/test_failure_taxonomy.py, asserting the four labels are present while preserving existing generic taxonomy tests and banned-term checks.
Testing
Ran pytest tests/test_failure_taxonomy.py -q; the new and updated taxonomy tests passed.
Ran pytest tests/test_fixture_manifest.py -q; it passed, verifying fixture manifests remain compatible with the taxonomy.
Ran npm run check, which completed successfully (full test suite executed and passed), validating typechecks, builds, and broader test coverage.
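The required-field guarantee mentioned in the description could be checked along the lines of the sketch below. The field names come from the description; the helper `missing_required_fields` and the sample entry are hypothetical and may not match how the repository's existing generic taxonomy tests are written.

```python
# Field names as listed in the PR description.
REQUIRED_FIELDS = frozenset(
    {
        "operational_meaning",
        "observable_trigger",
        "contract_or_invariant_type",
        "severity_class",
        "non_goal",
    }
)


def missing_required_fields(entry):
    """Return the sorted required field names absent from a taxonomy entry."""
    return sorted(REQUIRED_FIELDS - entry.keys())


# Hypothetical incomplete entry used only to exercise the helper.
sample_entry = {
    "operational_meaning": "example meaning",
    "observable_trigger": "example trigger",
    "severity_class": "high",
}

print(missing_required_fields(sample_entry))
```

A test could loop over every entry in the taxonomy and assert that this helper returns an empty list for each label.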